One of the most challenging problems in recommender systems based on the collaborative filtering (CF)
concept is data sparseness, i.e., limited user preference data is available for making recommendations. 
Cross-domain collaborative filtering (CDCF) has been studied as an effective mechanism to alleviate data 
sparseness of one domain by transferring knowledge about user preferences from other domains. However, 
there are two key issues that need to be addressed to make a CDCF approach successful: (a) what common
characteristics can be used to establish a link between different domains and (b) how to enable each domain
to benefit effectively and efficiently from such a link. In this paper, we propose a novel algorithm,
Generalized Tag-induced Cross-domain Collaborative Filtering (GTagCDCF), that exploits user-contributed tags
as common characteristics to link different recommender domains together. Formulated from the proba- 
bilistic point of view, GTagCDCF takes into account all the user-item relations, the user-tag relations and 
the item-tag relations from different domains, resulting in a substantially extended collective matrix fac- 
torization framework, in which the common tags take the role of effectively transferring the knowledge 
between different domains. GTagCDCF is also found to be efficient, since its complexity is linear in the 
number of observed relations among users, items and tags from all domains. Using publicly available CF 
datasets to represent three cross-domain cases, i.e., two two-domain cases and one three-domain case, we 
experimentally demonstrate that GTagCDCF substantially outperforms several state-of-the-art single do- 
main and cross-domain CF-based recommendation approaches. GTagCDCF is also shown to be effective for 
heterogeneous cross-domain cases, in which different domains are characterized by different types of user 
preferences. In addition, our investigation of the impact of user tagging behavior on GTagCDCF led to the 
conclusion that users can already benefit from GTagCDCF if they only share a few common tags. Finally, 
we validate the robustness of GTagCDCF with respect to the scale of datasets and the number of domains, 
based on a three-domain experiment.

1. INTRODUCTION

Recommender systems have become ubiquitous as a means of relieving internet users from
information overload on online sites such as Amazon^1, YouTube^2 and Netflix^3. One of the most
successful techniques in recommender systems is collaborative filtering (CF) [Adomavicius and
Tuzhilin 2005], which is based on the concept that users who had similar interests in
the past are likely to have similar preferences in the future [Resnick et al. 1994]. However, one
of the most challenging problems in CF has been recognized as data sparseness [Adomavicius
and Tuzhilin 2005; Cacheda et al. 2011; Herlocker et al. 2004], since most of
the users in a recommender system do not rate (or give any kind of feedback to) many
items, resulting in highly sparse user-item relations. To address the data sparseness
problem specifically, recent research in recommender systems has started to study the
potential of cross-domain collaborative filtering (CDCF) [Li et al. 2009a; Li et al. 2009b;
Li et al. 2011; Pan et al. 2010; Pan et al. 2011; Zhang et al. 2010]. The key idea of CDCF
is to make use of rating (or user preference) data from different recommender domains
so that recommendations in one domain can benefit from other domains. Conceptually,
CDCF alleviates data sparseness in one recommender domain by
transferring knowledge of user preferences from other domains.

Although the idea of CDCF is intuitively sound, there are several challenges that 
we have to face in order to develop a successful CDCF approach. First, different rec- 
ommender domains are usually mutually exclusive. Each domain involves a certain 
type of products, e.g., movies, music or books, which could be different from products 
in other domains, and each domain involves a certain set of users, which could also be
different from users in other domains, or who could hardly be matched across domains by their usernames.
As a result, it is difficult, or perhaps also undesirable, to directly link users or items
from different domains. Second, different recommender domains may have
different types of user preference feedback. For instance, one domain may encode user
preference explicitly, such as ratings or scores, while another domain may only have 
user preference with implicit feedback, such as the number of clicks, the duration of 
staying in a site, etc. As a result, it is difficult to utilize those heterogeneous user 
preference data from different domains for mutual benefits. Third, even if common 
characteristics among different domains are available, it is still difficult to maximally 
benefit from those common characteristics for improving recommendation quality in 
each domain, while keeping the recommendation process scalable. 

In this paper, we propose a novel approach, generalized tag-induced cross-domain 
collaborative filtering (GTagCDCF), that specifically addresses the aforementioned 
challenges. We exploit user-generated tags that are common to different domains to 
link their users and items, in order to transfer the knowledge of user preference from 
one domain to another. Tagging has become a ubiquitous function in most of today's
recommender systems [Robu et al. 2009; Song et al. 2011], in which a user can
annotate an item with an arbitrary textual expression, i.e., a tag. Since a tag usually
encodes a user's feeling about an item, and even users from different domains may
use the same tags [Robu et al. 2009], we infer that users from different domains may
use the same tags to express similar feelings about items from different domains. GTagCDCF is
formulated from a probabilistic point of view that takes into account all of the user-item
preferences, the user-tag relations and the item-tag relations from different domains,
resulting in a matrix factorization framework from which the latent features of
users and items from different domains can be learned with tag-induced benefits from
each other. GTagCDCF is a generalized framework for CDCF, since 1) it maximally
exploits the information of common tags for knowledge transfer among domains, and 2)
it suits most cross-domain situations, i.e., it is able to tackle different sets of
users and items in different domains.

^1 http://www.amazon.com/
^2 http://www.youtube.com/
^3 http://www.netflix.com/

The novel contributions of this paper can be summarized as follows.

— We propose a new cross-domain collaborative filtering framework that could exploit 
explicitly shared knowledge between different domains for mutual improvement of 
recommendation performance. 

— We experimentally demonstrate that the proposed GTagCDCF could significantly 
outperform several single-domain and cross-domain CF approaches. 

— We find that the proposed GTagCDCF particularly benefits users who are
active in tagging.

The remainder of the paper is structured as follows. In the next section, we
summarize related work and position our approach with respect to it. We formalize the
research problem and the terminology in Section 3, followed by a detailed
presentation of GTagCDCF in Section 4. In Section 5, we present an experimental evaluation of
GTagCDCF on three cross-domain cases. The last section sums up the key aspects of
GTagCDCF and briefly addresses directions for future work.

2. RELATED WORK 

This section briefly summarizes the existing related research in CF, CDCF and
tag-aware recommendation, in order to position the approach we propose in this paper.

2.1. Collaborative Filtering 



CF approaches are usually categorized as either memory-based or model-based [Adomavicius
and Tuzhilin 2005; Breese et al. 1998; Herlocker et al. 2004]. A recent
comparative study of CF approaches can be found in [Cacheda et al. 2011]. Depending
on whether the recommendation for a user is aggregated from other users with
similar interests, or from items that are similar to those she already liked,
memory-based CF approaches can be further categorized as either user-based collaborative
filtering (UBCF) [Herlocker et al. 1999; Resnick et al. 1994] or item-based
collaborative filtering [Deshpande and Karypis 2004; Linden et al. 2003; Sarwar et al. 2001].
Although various modifications have been made to memory-based approaches, e.g.,
similarity fusion [Wang et al. 2006] and employing external knowledge [Umyarov and
Tuzhilin 2011], the key drawback remains the expensive computation of similarities
among all users or items, which can hardly scale to the extremely large
numbers of users and items in real-world recommender systems. In contrast,
model-based approaches do not rely on computing similarities among users or items;
they first use a training set of user-item preference data to
learn a prediction model, and then apply that model to generate recommendations.
Conventional model-based CF approaches include the Gaussian mixture model [Kleinberg and
Sandler 2004; Si and Jin 2003] and the latent semantic model [Hofmann 2004]. Recently,
matrix factorization (MF) techniques have attracted much research attention due to
their scalability and accuracy, especially for large-scale data, as
exemplified by the Netflix contest [Koren et al. 2009; Paterek 2007; Salakhutdinov and Mnih
2008]. Generally, MF techniques exploit the observed user preference data of a
recommender system to learn both latent user features and latent item features, which are
then used to predict unknown user preferences for items. Probabilistic matrix
factorization (PMF) [Salakhutdinov and Mnih 2008] formulates rating factorization from
a probabilistic point of view. Our work in this paper builds on the PMF concept, but
goes beyond PMF due to the substantial difference between cross-domain scenarios
and single-domain scenarios.

We notice that our work is closely related to the work on MF with multiple matrices.
To the best of our knowledge, in the context of CF, collective matrix factorization
(CMF) [Singh and Gordon 2008] is the first work that proposed to factorize multiple
matrices related to either users or items in addition to the user-item rating matrix, while
a similar approach was proposed slightly earlier in the context of text
classification [Zhu et al. 2007]. The proposed GTagCDCF in this paper can be viewed as an
extended version of CMF. The detailed analysis of the substantial difference between
GTagCDCF and CMF will be presented in Section 4.5 after introducing the technical
details of GTagCDCF. Recently, due to the availability of large amounts of social
relationship data and contextual information in recommender systems, researchers have
proposed to improve recommendation by developing MF with multiple matrices
involving not only the user-item rating matrix but also social-aware information [Ma et al.
2011] or context-aware information [Shi et al. 2013]. In addition, researchers have also
proposed to model MF with multiple matrices from a probabilistic point of view by
extending the PMF framework with additional priors [Adams et al. 2010; Porteous et al.
2010]. However, compared to the aforementioned work, our work in this paper is
substantially different, since we tackle the CDCF problem, in which a key challenge lies
in the connection between different domains.

2.2. Cross-domain Collaborative Filtering 

We notice that the work most closely related to ours is on CDCF approaches.
One of the earliest works on CDCF was done by Berkovsky et al. [Berkovsky et al. 2007],
who deployed several mediation approaches for importing and aggregating user rating
vectors from different domains. Recently, research on CDCF has been influenced by
and has benefited from the progress of transfer learning [Pan and Yang 2010], a machine
learning paradigm for sharing knowledge among different domains. Coordinate
system transfer [Pan et al. 2010] is proposed to first learn latent features of users and
items from an auxiliary domain (which has relatively more user preference data), and
then adapt them to a target domain (which has relatively less user preference data)
in order to regularize the learning of latent features for the users and the items in
the target domain. A further extension has also been proposed to exploit implicit user
feedback rather than explicit user ratings to constitute the auxiliary domain [Pan
et al. 2011]. However, these approaches require that either users or items are shared
between the domains, which is, as already mentioned in Section 1, a condition not
commonly encountered in practical applications. Codebook transfer (CBT) [Li et al. 2009a]
and the rating-matrix generative model (RMGM) [Li et al. 2009b] are proposed to transfer
knowledge from an auxiliary domain by learning an implicit cluster-level rating
pattern that could be shared with a target domain. Similarly, multi-domain CF is proposed to
extend PMF to multiple domains involving explicit user preference [Zhang et al. 2010]
or implicit user feedback [Tang et al. 2011], by means of learning an implicit
correlation matrix, which is supposed to link different domains for knowledge transfer. One
of the latest works has adopted the CDCF framework of RMGM [Li et al. 2009b] to
address the problem of dynamic CF [Li et al. 2011]. Compared to all the aforementioned
CDCF approaches, the proposed GTagCDCF in this paper is substantially different in
that we exploit an explicit information source, i.e., the tag, between different domains
for knowledge transfer, rather than relying on implicit domain correlations that are
still mined from user preference data. In addition, GTagCDCF is different from CBT
and RMGM, since GTagCDCF could tackle the case of domains with different types
of user preference, while CBT and RMGM require that different domains should have
the same rating scale.

In our previous work, we have proposed tag-induced cross-domain collaborative
filtering (TagCDCF) [Shi et al. 2011] to use common tags as bridges to link different
domains for improving CDCF. However, there are two specific drawbacks that need
to be addressed. First, TagCDCF links different domains based on cross-domain
similarities, which need to be computed by using a specific similarity function.
However, the recommendation performance might be sensitive to the choice of the
similarity function, and the best choice might vary substantially across use cases. For this reason,
a good choice from a number of similarity functions usually requires costly
empirical investigation. In addition, calculating these similarities offline is still a
computationally expensive task in the case of a large number of cross-domain users or items.
Second, TagCDCF only exploits user-tag relations and item-tag relations with binary
indicators, while the frequency of a user assigning a tag and the frequency of an item
being annotated by a tag are totally ignored. As a result, valuable information may be lost
during this dramatic simplification, leading to a suboptimal knowledge transfer by the
common tags. The proposed GTagCDCF specifically addresses the two drawbacks of
TagCDCF, as it does not require computing cross-domain similarities and explicitly
takes into account the frequency of user-tag relations and item-tag relations. For
this reason, we refer to GTagCDCF as a generalized version of TagCDCF, since it suits
more general use cases.



2.3. Tag-aware Recommendation 

We also point out that our work in this paper is related to tag-aware
recommendation. Tags are exploited to improve recommendation by various means, e.g., to fuse
tags into traditional user-based CF and item-based CF [Liang et al. 2010; Sen et al.
2009; Tso-Sutter et al. 2008], to incorporate tags into the probabilistic latent semantic
model [Hofmann 2004] to unify user-item relations and item-tag relations [Wang et al.
2006; Wetzker et al. 2009], and to use tag-based user correlations as a regularization
for PMF [Zhen et al. 2009], etc. More recently, another group of state-of-the-art
approaches has employed tensor factorization techniques [Kolda and Sun 2008] for
tag-aware recommendation. Under such approaches, item recommendations or tag
recommendations are learned from the {user, tag, item} triplet/ternary data directly [Rendle
and Schmidt-Thieme 2010; Symeonidis et al. 2010]. Given the advantages of using tags
for improving recommender systems, our work in this paper goes a step further and
explores the potential of using tags for introducing mutual benefits between different
recommender domains.



3. PROBLEM STATEMENT AND TERMINOLOGY 
3.1. Problem Statement 

The research problem we study in this paper can be stated as: How to effectively exploit
the common tags between different recommender domains for improving the quality
of recommendations in each domain. Specifically, we illustrate this problem by a toy
example as shown in Fig. 1. Supposing that Alice is a user in a movie recommender
system/domain, and Bob is a user in a book recommender domain, the problem is to
predict Alice's rating on Movie2 and Bob's rating on Book1 (or in other words, whether
the movie recommender system should recommend Movie2 to Alice and whether the
book recommender system should recommend Book1 to Bob). Note that in the example
we use a solid line to denote a rating (e.g., scaled from 1 to 5), and a dashed line to
denote a relationship between a user and a tag or a relationship between an item and
a tag. For this example, we can see that the two domains may both suffer from data
sparseness, since it is difficult to infer the user's preference solely based on the domain
that the user exists in. However, there is potential to infer the user's preference based
on the common tags "fun" and "ridiculous". As can be seen, Alice rated Movie1
with the highest rating, and she also tagged Movie1 with "fun". From this observation,
we may infer that an item tagged with "fun" tends to be favored by users. Based on this
inference, we can infer that in the book domain, Bob might also be in favor of Book1,
which is tagged with "fun" by some other users. According to the same reasoning, we
can infer that Movie2 may not be a good recommendation for Alice. Although this toy
example in Fig. 1 is much simplified from real systems, it still demonstrates that there
is great potential for different recommender domains to benefit mutually from the
common tags. Our work in this paper is to develop a new algorithm that can effectively
achieve the aim of cross-domain CF via common tags.



3.2. Terminology and Notations 

In this subsection, we first define the terms used further in the paper. A user-item
matrix in the $k$th ($k = 1, 2, \ldots, K$) domain is denoted as $R^{(k)}$, which consists of $M_k$ users and
$N_k$ items. Note that in this paper, $R^{(k)}$ could be a matrix with either explicit ratings or
implicit feedback data. We use $R_{ij}^{(k)}$ to denote user $i$'s preference score (derived from
either explicit rating or implicit feedback) to item $j$ in the $k$th domain. A user-tag
matrix in the $k$th domain is denoted as $F^{(U_k)}$, in which $F_{il}^{(U_k)}$ indicates the frequency of user
$i$ using tag $l$. Similarly, an item-tag matrix in the $k$th domain is denoted as $F^{(V_k)}$, in which
$F_{jl}^{(V_k)}$ indicates the frequency of item $j$ annotated by tag $l$. Note that without loss of
generality, the non-zero entries of $R^{(k)}$, $F^{(U_k)}$ and $F^{(V_k)}$ are, respectively, normalized to
be within the range [0,1], by means of dividing over the maximal value in each matrix.
We adopt the convention of denoting the number of non-zero entries in a matrix $A$ as
$|A|$. In addition, we use $I_{ij}^{A}$ to serve as an indicator function, which gives $I_{ij}^{A} = 1$ if $A_{ij} > 0$, and 0
otherwise. $\|A\|_F$ denotes the Frobenius norm of matrix $A$.
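
The notation above can be illustrated with a minimal sketch, assuming Python with NumPy (neither is prescribed by the paper). The helper names `normalize_by_max`, `indicator` and `nnz`, and the toy counts, are hypothetical and serve only to make the definitions concrete.

```python
import numpy as np

def normalize_by_max(A):
    """Divide a non-negative matrix by its largest entry; zeros stay zero."""
    A = np.asarray(A, dtype=float)
    peak = A.max()
    return A / peak if peak > 0 else A.copy()

def indicator(A):
    """Indicator I^A: 1.0 where A_ij > 0 and 0.0 elsewhere."""
    return (np.asarray(A) > 0).astype(float)

def nnz(A):
    """|A|: the number of non-zero entries of A."""
    return int(np.count_nonzero(A))

# Hypothetical raw user-tag counts for one domain: 2 users x 3 tags.
F_U_raw = np.array([[4, 0, 1],
                    [0, 2, 0]])
F_U = normalize_by_max(F_U_raw)      # entries now lie in [0, 1]
I_FU = indicator(F_U)                # marks the observed user-tag relations
frob = np.linalg.norm(F_U, "fro")    # Frobenius norm ||F_U||_F
```

Zero entries are treated as unobserved relations, which is why the indicator, rather than the raw matrix, selects the terms that enter the sums in Section 4.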



4. GENERALIZED TAG-INDUCED CROSS-DOMAIN COLLABORATIVE FILTERING 

In this section, we present the details of generalized tag-induced cross-domain
collaborative filtering (GTagCDCF). We first introduce the graphical model of GTagCDCF,
from which we demonstrate how tags bridge different domains. Then, we interpret the
graphical model from a probabilistic point of view, resulting in a matrix factorization
framework that incorporates tags to benefit different domains in learning latent
features of users and items. Finally, we present a complexity analysis of GTagCDCF and
summarize the main characteristics of GTagCDCF.

Fig. 2. Graphical model of GTagCDCF



4.1. Graphical Model 

We present the graphical model of the proposed GTagCDCF approach, as shown in
Fig. 2, in order to illustrate the relationships between different variables/factors from
$K$ domains. $U^{(k)}$ denotes a $d \times M_k$ matrix, whose $i$th column, i.e., $U_i^{(k)}$, represents a
$d$-dimensional latent feature vector of user $i$ in the $k$th domain. $V^{(k)}$ denotes a $d \times N_k$
matrix, whose $j$th column, i.e., $V_j^{(k)}$, represents a $d$-dimensional latent feature vector
of item $j$ in the $k$th domain. Supposing there are $L$ tags that are common (co-existing)
to all the $K$ domains, $T$ denotes a $d \times L$ matrix, whose $l$th column, i.e., $T_l$, represents a
$d$-dimensional latent feature vector of tag $l$. Our objective is to estimate the unknown
latent features of users and items, i.e., $U^{(k)}$ and $V^{(k)}$ ($k = 1, 2, \ldots, K$), which can then be
used to predict recommendations in each domain. Compared to PMF [Salakhutdinov
and Mnih 2008], we can observe that in Fig. 2 the sub-graph that only involves $U^{(k)}$,
$V^{(k)}$ and $R^{(k)}$ is equivalent to PMF in a single domain. The set of latent features of
common tags, i.e., $T$, takes the role of the central node that connects the $K$ domains. As can
be seen, the latent features of users in one domain influence those in other
domains via the common tags, and so do the latent features of items. The strength
of the mutual influence is reflected by the frequency of user-tag relations, i.e., $F^{(U_k)}$,
and the frequency of item-tag relations, i.e., $F^{(V_k)}$. In the following, we will present the
details of learning latent features of users and items based on this graphical model.

4.2. Model Realization and Optimization 

In this subsection, we interpret the graphical model of GTagCDCF in Fig. 2 from a
probabilistic point of view, and further derive the objective function of GTagCDCF.
For notational convenience, we introduce some substitutions for the variables in the
graphical model, as shown below:

$$Y = \{R^{(k)}, F^{(U_k)}, F^{(V_k)} \mid k = 1, 2, \ldots, K\}$$
$$X = \{U^{(k)}, V^{(k)}, T \mid k = 1, 2, \ldots, K\}$$
$$\Theta = \{\sigma_R, \sigma_{F_U}, \sigma_{F_V}, \sigma_{U_k}, \sigma_{V_k}, \sigma_T \mid k = 1, 2, \ldots, K\}$$

According to probabilistic graphical model theory [Bishop 2006], the joint probability
distribution of all the variables in the graphical model can be factorized as below:

$$
p(X, Y, \Theta) = \prod_{k=1}^{K} \Big[ p(R^{(k)} \mid U^{(k)}, V^{(k)}, \sigma_R^2)\, p(F^{(U_k)} \mid U^{(k)}, T, \sigma_{F_U}^2)\, p(F^{(V_k)} \mid V^{(k)}, T, \sigma_{F_V}^2)\, p(U^{(k)} \mid \sigma_{U_k}^2)\, p(V^{(k)} \mid \sigma_{V_k}^2) \Big]\, p(T \mid \sigma_T^2)\, p(\Theta)
\tag{1}
$$

Applying the product rule to the left side of Eq. (1), i.e., $p(X, Y, \Theta) = p(X \mid Y, \Theta)\, p(Y, \Theta)$,
and neglecting constant prior probabilities, i.e., priors that have no influence on the latent
features $U^{(k)}$, $V^{(k)}$ ($k = 1, 2, \ldots, K$) and $T$, we obtain the conditional probability distribution
of latent features as below:

$$
p(X \mid Y, \Theta) \propto \prod_{k=1}^{K} \Big[ p(R^{(k)} \mid U^{(k)}, V^{(k)}, \sigma_R^2)\, p(F^{(U_k)} \mid U^{(k)}, T, \sigma_{F_U}^2)\, p(F^{(V_k)} \mid V^{(k)}, T, \sigma_{F_V}^2)\, p(U^{(k)} \mid \sigma_{U_k}^2)\, p(V^{(k)} \mid \sigma_{V_k}^2) \Big]\, p(T \mid \sigma_T^2)
\tag{2}
$$

As can be seen in Eq. (2), the latent features of users, items and tags can be
estimated by the principle of maximum a posteriori, if we interpret all the conditional
probabilities on the right side of Eq. (2) properly. In the following, we will present
interpretations of all the conditional probabilities based on insights from probabilistic
matrix factorization [Salakhutdinov and Mnih 2008].

By using the conditional independence of individual ratings given latent features of
users and items, the conditional probability distribution over observed ratings in the
$k$th domain is defined as:

$$
p(R^{(k)} \mid U^{(k)}, V^{(k)}, \sigma_R^2) = \prod_{i=1}^{M_k} \prod_{j=1}^{N_k} \Big[ \mathcal{N}\big(R_{ij}^{(k)} \mid g(U_i^{(k)\top} V_j^{(k)}), \sigma_R^2\big) \Big]^{I_{ij}^{R^{(k)}}}
\tag{3}
$$

where $\mathcal{N}(x \mid \mu, \sigma^2)$ denotes the probability density function of a Gaussian distribution
with mean $\mu$ and variance $\sigma^2$. $g(x)$ is the logistic function, i.e., $g(x) = 1/(1 + e^{-x})$,
which is used to map the inner product of latent user features and latent item features
to be within the range [0,1].
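
To make Eq. (3) concrete, the following is a minimal sketch of its negative logarithm for a single domain, assuming Python with NumPy and dense arrays in which zeros mark unobserved ratings. The function names `g` and `neg_log_lik` are hypothetical. The sketch shows that the Gaussian likelihood reduces to a squared-error term over observed entries plus a constant that depends only on the number of observed entries and on $\sigma_R$.

```python
import numpy as np

def g(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neg_log_lik(R, U, V, sigma_R):
    """Negative log of Eq. (3) for one domain.

    R: (M, N) preference scores in [0, 1]; zeros are treated as unobserved.
    U: (d, M) latent user features; V: (d, N) latent item features.
    """
    observed = R > 0                       # indicator I^R
    err = (R - g(U.T @ V))[observed]       # residuals on observed entries only
    squared_error = np.sum(err ** 2) / (2.0 * sigma_R ** 2)
    constant = err.size * np.log(np.sqrt(2.0 * np.pi) * sigma_R)
    return squared_error + constant
```

Only the squared-error part depends on the latent features, which is why the constant part can be set aside when the latent features are optimized.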

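Similarly, we also define the conditional probability distributions over observed
user-tag frequency and observed item-tag frequency in the $k$th domain as in Eq. (4) and in
Eq. (5), respectively.

$$
p(F^{(U_k)} \mid U^{(k)}, T, \sigma_{F_U}^2) = \prod_{i=1}^{M_k} \prod_{l=1}^{L} \Big[ \mathcal{N}\big(F_{il}^{(U_k)} \mid g(U_i^{(k)\top} T_l), \sigma_{F_U}^2\big) \Big]^{I_{il}^{F^{(U_k)}}}
\tag{4}
$$

$$
p(F^{(V_k)} \mid V^{(k)}, T, \sigma_{F_V}^2) = \prod_{j=1}^{N_k} \prod_{l=1}^{L} \Big[ \mathcal{N}\big(F_{jl}^{(V_k)} \mid g(V_j^{(k)\top} T_l), \sigma_{F_V}^2\big) \Big]^{I_{jl}^{F^{(V_k)}}}
\tag{5}
$$

Finally, we represent latent features of users and items in each domain, and latent
features of tags with zero-mean spherical Gaussian priors [Tipping and Bishop 1999].

$$
p(U^{(k)} \mid \sigma_{U_k}^2) = \prod_{i=1}^{M_k} \mathcal{N}\big(U_i^{(k)} \mid 0, \sigma_{U_k}^2 \mathbf{I}\big)
\tag{6}
$$

$$
p(V^{(k)} \mid \sigma_{V_k}^2) = \prod_{j=1}^{N_k} \mathcal{N}\big(V_j^{(k)} \mid 0, \sigma_{V_k}^2 \mathbf{I}\big)
\tag{7}
$$

$$
p(T \mid \sigma_T^2) = \prod_{l=1}^{L} \mathcal{N}\big(T_l \mid 0, \sigma_T^2 \mathbf{I}\big)
\tag{8}
$$

where $\mathbf{I}$ is a $d \times d$ identity matrix.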

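Substituting Eqs. (3)-(8) into Eq. (2), we obtain the posterior as below:

$$
\begin{aligned}
p(X \mid Y, \Theta) \propto{}& \prod_{k=1}^{K} \Bigg\{ \prod_{i=1}^{M_k} \prod_{j=1}^{N_k} \Big[ \mathcal{N}\big(R_{ij}^{(k)} \mid g(U_i^{(k)\top} V_j^{(k)}), \sigma_R^2\big) \Big]^{I_{ij}^{R^{(k)}}}
\prod_{i=1}^{M_k} \prod_{l=1}^{L} \Big[ \mathcal{N}\big(F_{il}^{(U_k)} \mid g(U_i^{(k)\top} T_l), \sigma_{F_U}^2\big) \Big]^{I_{il}^{F^{(U_k)}}} \\
&\times \prod_{j=1}^{N_k} \prod_{l=1}^{L} \Big[ \mathcal{N}\big(F_{jl}^{(V_k)} \mid g(V_j^{(k)\top} T_l), \sigma_{F_V}^2\big) \Big]^{I_{jl}^{F^{(V_k)}}}
\prod_{i=1}^{M_k} \mathcal{N}\big(U_i^{(k)} \mid 0, \sigma_{U_k}^2 \mathbf{I}\big)
\prod_{j=1}^{N_k} \mathcal{N}\big(V_j^{(k)} \mid 0, \sigma_{V_k}^2 \mathbf{I}\big) \Bigg\} \\
&\times \prod_{l=1}^{L} \mathcal{N}\big(T_l \mid 0, \sigma_T^2 \mathbf{I}\big)
\end{aligned}
\tag{9}
$$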



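As mentioned before, the latent features of users, items and tags can be estimated
by maximizing this posterior as in Eq. (9). Notice that maximizing the posterior is
equivalent to minimizing the negative log-posterior, which is
shown below:

$$
\begin{aligned}
-\log p(X \mid Y, \Theta) ={}& \frac{1}{2\sigma_R^2} \sum_{k=1}^{K} \sum_{i=1}^{M_k} \sum_{j=1}^{N_k} I_{ij}^{R^{(k)}} \Big[ R_{ij}^{(k)} - g(U_i^{(k)\top} V_j^{(k)}) \Big]^2 \\
&+ \frac{1}{2\sigma_{F_U}^2} \sum_{k=1}^{K} \sum_{i=1}^{M_k} \sum_{l=1}^{L} I_{il}^{F^{(U_k)}} \Big[ F_{il}^{(U_k)} - g(U_i^{(k)\top} T_l) \Big]^2 \\
&+ \frac{1}{2\sigma_{F_V}^2} \sum_{k=1}^{K} \sum_{j=1}^{N_k} \sum_{l=1}^{L} I_{jl}^{F^{(V_k)}} \Big[ F_{jl}^{(V_k)} - g(V_j^{(k)\top} T_l) \Big]^2 \\
&+ \sum_{k=1}^{K} \sum_{i=1}^{M_k} \frac{1}{2\sigma_{U_k}^2} U_i^{(k)\top} U_i^{(k)}
+ \sum_{k=1}^{K} \sum_{j=1}^{N_k} \frac{1}{2\sigma_{V_k}^2} V_j^{(k)\top} V_j^{(k)}
+ \sum_{l=1}^{L} \frac{1}{2\sigma_T^2} T_l^{\top} T_l + C
\end{aligned}
\tag{10}
$$

in which

$$
\begin{aligned}
C ={}& \sum_{k=1}^{K} \sum_{i=1}^{M_k} \sum_{j=1}^{N_k} I_{ij}^{R^{(k)}} \log\big(\sqrt{2\pi}\,\sigma_R\big)
+ \sum_{k=1}^{K} \sum_{i=1}^{M_k} \sum_{l=1}^{L} I_{il}^{F^{(U_k)}} \log\big(\sqrt{2\pi}\,\sigma_{F_U}\big)
+ \sum_{k=1}^{K} \sum_{j=1}^{N_k} \sum_{l=1}^{L} I_{jl}^{F^{(V_k)}} \log\big(\sqrt{2\pi}\,\sigma_{F_V}\big) \\
&+ \sum_{k=1}^{K} d M_k \log\big(\sqrt{2\pi}\,\sigma_{U_k}\big)
+ \sum_{k=1}^{K} d N_k \log\big(\sqrt{2\pi}\,\sigma_{V_k}\big)
+ d L \log\big(\sqrt{2\pi}\,\sigma_T\big)
\end{aligned}
\tag{11}
$$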



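Note that $C$ is irrelevant to the latent features $U^{(k)}$, $V^{(k)}$ and $T$, meaning it has no influence
on optimizing these latent features towards an optimal solution. In order to further
simplify the model, we also assume that the prior variances of all the latent features are
the same, i.e., $\sigma_{U_k} = \sigma_{V_k} = \sigma_T = \sigma$. Applying these conditions to Eq. (10), dropping $C$
and multiplying through by $\sigma_R^2$, we obtain
the final objective function of GTagCDCF as $G(X)$:

$$
\begin{aligned}
G(X) ={}& \frac{1}{2} \sum_{k=1}^{K} \sum_{i=1}^{M_k} \sum_{j=1}^{N_k} I_{ij}^{R^{(k)}} \Big[ R_{ij}^{(k)} - g(U_i^{(k)\top} V_j^{(k)}) \Big]^2 \\
&+ \frac{\alpha}{2} \sum_{k=1}^{K} \sum_{i=1}^{M_k} \sum_{l=1}^{L} I_{il}^{F^{(U_k)}} \Big[ F_{il}^{(U_k)} - g(U_i^{(k)\top} T_l) \Big]^2 \\
&+ \frac{\beta}{2} \sum_{k=1}^{K} \sum_{j=1}^{N_k} \sum_{l=1}^{L} I_{jl}^{F^{(V_k)}} \Big[ F_{jl}^{(V_k)} - g(V_j^{(k)\top} T_l) \Big]^2 \\
&+ \frac{\lambda}{2} \Bigg[ \sum_{k=1}^{K} \Big( \|U^{(k)}\|_F^2 + \|V^{(k)}\|_F^2 \Big) + \|T\|_F^2 \Bigg]
\end{aligned}
\tag{12}
$$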

where we have set substitutions as $\alpha = \sigma_R^2/\sigma_{F_U}^2$, $\beta = \sigma_R^2/\sigma_{F_V}^2$, and $\lambda = \sigma_R^2/\sigma^2$. $\lambda$ is a
regularization parameter that penalizes the magnitude of latent features in order to
alleviate over-fitting. $\alpha$ and $\beta$ are regarded as tradeoff parameters, which control the
relative influence from the user-tag matrices in $K$ domains and the item-tag matrices
in $K$ domains, respectively. We emphasize that in the objective function the second
term reflects that latent features of users from $K$ domains are associated through
latent features of shared tags, and the third term reflects that latent features of items
from $K$ domains are also associated through latent features of shared tags. Therefore,
the proposed GTagCDCF bridges different domains by exploiting the common tags that
have relationships with both users and items from different domains.
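
The objective function can be sketched in a few lines, assuming Python with NumPy, dense arrays in which zeros mark unobserved entries, and one list entry per domain. The names `masked_sq_err` and `objective` are hypothetical, and the small example values are illustrative only.

```python
import numpy as np

def g(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def masked_sq_err(A, P, Q):
    """Sum over observed (non-zero) A_ab of (A_ab - g(P_a^T Q_b))^2."""
    observed = A > 0
    return np.sum(((A - g(P.T @ Q)) ** 2)[observed])

def objective(R, FU, FV, U, V, T, alpha, beta, lam):
    """G(X) of Eq. (12); R, FU, FV, U and V are lists with one entry per domain."""
    total = 0.5 * lam * np.sum(T ** 2)                      # ||T||_F^2 term
    for Rk, FUk, FVk, Uk, Vk in zip(R, FU, FV, U, V):
        total += 0.5 * masked_sq_err(Rk, Uk, Vk)            # user-item term
        total += 0.5 * alpha * masked_sq_err(FUk, Uk, T)    # user-tag term
        total += 0.5 * beta * masked_sq_err(FVk, Vk, T)     # item-tag term
        total += 0.5 * lam * (np.sum(Uk ** 2) + np.sum(Vk ** 2))
    return total

# Hypothetical single-domain example with all latent features set to zero,
# so that every prediction equals g(0) = 0.5.
R = [np.array([[1.0, 0.0]])]        # 1 user, 2 items, one observed rating
FU = [np.array([[0.5]])]            # 1 user, 1 tag
FV = [np.array([[1.0], [0.0]])]     # 2 items, 1 tag
U, V, T = [np.zeros((2, 1))], [np.zeros((2, 2))], np.zeros((2, 1))
G0 = objective(R, FU, FV, U, V, T, alpha=2.0, beta=4.0, lam=0.1)
```

The tag features `T` appear in every domain's user-tag and item-tag terms, which is the only coupling between domains in this sketch.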

The objective function in Eq. (12) is not jointly convex in all the variables of latent
features, i.e., $U^{(k)}$, $V^{(k)}$ ($k = 1, 2, \ldots, K$) and $T$. We choose to apply gradient descent with
respect to one of these variables at a time, alternately, while keeping all the other variables fixed,
in order to attain a local minimum solution for the objective function. Specifically, the
gradients with respect to each variable can be computed as below:



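$$
\frac{\partial G}{\partial U_i^{(k)}} = \sum_{j=1}^{N_k} I_{ij}^{R^{(k)}} \Big[ g(U_i^{(k)\top} V_j^{(k)}) - R_{ij}^{(k)} \Big] g'(U_i^{(k)\top} V_j^{(k)})\, V_j^{(k)}
+ \alpha \sum_{l=1}^{L} I_{il}^{F^{(U_k)}} \Big[ g(U_i^{(k)\top} T_l) - F_{il}^{(U_k)} \Big] g'(U_i^{(k)\top} T_l)\, T_l
+ \lambda U_i^{(k)}, \quad k = 1, 2, \ldots, K
\tag{13}
$$

$$
\frac{\partial G}{\partial V_j^{(k)}} = \sum_{i=1}^{M_k} I_{ij}^{R^{(k)}} \Big[ g(U_i^{(k)\top} V_j^{(k)}) - R_{ij}^{(k)} \Big] g'(U_i^{(k)\top} V_j^{(k)})\, U_i^{(k)}
+ \beta \sum_{l=1}^{L} I_{jl}^{F^{(V_k)}} \Big[ g(V_j^{(k)\top} T_l) - F_{jl}^{(V_k)} \Big] g'(V_j^{(k)\top} T_l)\, T_l
+ \lambda V_j^{(k)}, \quad k = 1, 2, \ldots, K
\tag{14}
$$

$$
\frac{\partial G}{\partial T_l} = \alpha \sum_{k=1}^{K} \sum_{i=1}^{M_k} I_{il}^{F^{(U_k)}} \Big[ g(U_i^{(k)\top} T_l) - F_{il}^{(U_k)} \Big] g'(U_i^{(k)\top} T_l)\, U_i^{(k)}
+ \beta \sum_{k=1}^{K} \sum_{j=1}^{N_k} I_{jl}^{F^{(V_k)}} \Big[ g(V_j^{(k)\top} T_l) - F_{jl}^{(V_k)} \Big] g'(V_j^{(k)\top} T_l)\, V_j^{(k)}
+ \lambda T_l
\tag{15}
$$

where $g'(x) = g(x)(1 - g(x))$ denotes the derivative of the logistic function.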
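
The three gradients can be written in matrix form and checked numerically. The sketch below assumes Python with NumPy and dense arrays with zeros for unobserved entries; the names `weighted_residual`, `gradients` and `make_toy_problem` are hypothetical, and the random toy data exist only to exercise the formulas. The `objective` function restates Eq. (12) so that the analytic gradients can be compared against finite differences.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective(R, FU, FV, U, V, T, alpha, beta, lam):
    """G(X) of Eq. (12), used here only to verify the gradients."""
    def sse(A, P, Q):
        return np.sum(((A - g(P.T @ Q)) ** 2)[A > 0])
    total = 0.5 * lam * np.sum(T ** 2)
    for Rk, FUk, FVk, Uk, Vk in zip(R, FU, FV, U, V):
        total += 0.5 * (sse(Rk, Uk, Vk) + alpha * sse(FUk, Uk, T) + beta * sse(FVk, Vk, T))
        total += 0.5 * lam * (np.sum(Uk ** 2) + np.sum(Vk ** 2))
    return total

def weighted_residual(A, P, Q):
    """I^A * (g(P^T Q) - A) * g'(P^T Q), using g' = g (1 - g)."""
    S = g(P.T @ Q)
    return (A > 0) * (S - A) * S * (1.0 - S)

def gradients(R, FU, FV, U, V, T, alpha, beta, lam):
    """Eqs. (13)-(15) for all columns at once; lists hold one entry per domain."""
    dU, dV, dT = [], [], lam * T
    for Rk, FUk, FVk, Uk, Vk in zip(R, FU, FV, U, V):
        E_R = weighted_residual(Rk, Uk, Vk)    # shape (M_k, N_k)
        E_U = weighted_residual(FUk, Uk, T)    # shape (M_k, L)
        E_V = weighted_residual(FVk, Vk, T)    # shape (N_k, L)
        dU.append(Vk @ E_R.T + alpha * (T @ E_U.T) + lam * Uk)   # Eq. (13)
        dV.append(Uk @ E_R + beta * (T @ E_V.T) + lam * Vk)      # Eq. (14)
        dT = dT + alpha * (Uk @ E_U) + beta * (Vk @ E_V)         # Eq. (15)
    return dU, dV, dT

def make_toy_problem(seed=0, d=3, sizes=((4, 5), (3, 6)), L=4):
    """Random two-domain data with roughly half of the entries observed."""
    rng = np.random.default_rng(seed)
    def sparse(shape):
        return rng.random(shape) * (rng.random(shape) < 0.5)
    R = [sparse((M, N)) for M, N in sizes]
    FU = [sparse((M, L)) for M, _ in sizes]
    FV = [sparse((N, L)) for _, N in sizes]
    U = [rng.standard_normal((d, M)) for M, _ in sizes]
    V = [rng.standard_normal((d, N)) for _, N in sizes]
    T = rng.standard_normal((d, L))
    return R, FU, FV, U, V, T
```

Comparing a few entries of the returned gradients with central finite differences of `objective` is a cheap way to catch sign or indexing mistakes before running the full optimization.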



For better readability, we summarize the algorithm of
GTagCDCF in Algorithm 1.

ALGORITHM 1: GTagCDCF

Input: Normalized user-item preference matrices $R^{(k)}$ ($k = 1, 2, \ldots, K$), normalized user-tag
  frequency matrices $F^{(U_k)}$ ($k = 1, 2, \ldots, K$), normalized item-tag frequency matrices $F^{(V_k)}$
  ($k = 1, 2, \ldots, K$), tradeoff parameters $\alpha$, $\beta$, regularization parameter $\lambda$, stop condition $\epsilon$.
Output: Predicted user-item preference matrices $\hat{R}^{(k)}$ ($k = 1, 2, \ldots, K$).

Initialize $U^{(k)[0]}$, $V^{(k)[0]}$ ($k = 1, 2, \ldots, K$) and $T^{[0]}$ with random values;
$t = 0$;
Compute $G^{[t]}$ as in Eq. (12);
repeat
    $\eta = 1$;
    Compute $\partial G / \partial U^{(k)[t]}$, $\partial G / \partial V^{(k)[t]}$ ($k = 1, 2, \ldots, K$) and $\partial G / \partial T^{[t]}$ as in Eqs. (13)-(15);
    repeat
        $\eta = \eta / 2$;  // maximize learning step size
    until $G\big(U^{(k)[t]} - \eta\, \partial G / \partial U^{(k)[t]},\ V^{(k)[t]} - \eta\, \partial G / \partial V^{(k)[t]},\ T^{[t]} - \eta\, \partial G / \partial T^{[t]}\big) < G^{[t]}$;
    $U^{(k)[t+1]} = U^{(k)[t]} - \eta\, \partial G / \partial U^{(k)[t]}$;
    $V^{(k)[t+1]} = V^{(k)[t]} - \eta\, \partial G / \partial V^{(k)[t]}$;
    $T^{[t+1]} = T^{[t]} - \eta\, \partial G / \partial T^{[t]}$;
    Compute $G^{[t+1]}$ as in Eq. (12);
    $f = 1 - G^{[t+1]} / G^{[t]}$;
    $t = t + 1$;
until $f < \epsilon$;
$\hat{R}^{(k)} = U^{(k)[t]\top} V^{(k)[t]}$ ($k = 1, 2, \ldots, K$);

4.3. Complexity Analysis

The complexity of the objective function of GTagCDCF in Eq. (12) is
$O\big(d \sum_k (|R^{(k)}| + |F^{(U_k)}| + |F^{(V_k)}| + M_k + N_k) + dL\big)$. The complexity of computing each
gradient in Eqs. (13)-(15) is $O\big(d \sum_k (|R^{(k)}| + |F^{(U_k)}| + M_k)\big)$, $O\big(d \sum_k (|R^{(k)}| + |F^{(V_k)}| + N_k)\big)$, and
$O\big(d \sum_k (|F^{(U_k)}| + |F^{(V_k)}|) + dL\big)$, respectively. Considering that in reality we usually have
$|R^{(k)}| \gg M_k, N_k$, $|F^{(U_k)}| \gg M_k, L$, and $|F^{(V_k)}| \gg N_k, L$, the total complexity
of GTagCDCF is in the order of $O\big(\sum_k (|R^{(k)}| + |F^{(U_k)}| + |F^{(V_k)}|)\big)$, which is linear in the
total number of the known preference scores in $R^{(k)}$, the known user-tag relationships
in $F^{(U_k)}$, and the known item-tag relationships in $F^{(V_k)}$ from all the domains. This
analysis indicates that GTagCDCF is appropriate for large-scale use cases.
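
The steps of Algorithm 1 can be sketched as follows, assuming Python with NumPy. The function name `fit`, its default parameter values, the initialization scale, the `max_iter` cap and the lower bound on the step size are assumptions added so that the sketch always terminates; they are not part of the algorithm as stated. For brevity the sketch uses dense arrays with zeros for unobserved entries, so it does not realize the linear complexity discussed above; an implementation that iterates only over observed entries (for example, with sparse matrices) would be needed for that.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective(R, FU, FV, U, V, T, alpha, beta, lam):
    """G(X) of Eq. (12) on dense arrays (zeros = unobserved)."""
    def sse(A, P, Q):
        return np.sum(((A - g(P.T @ Q)) ** 2)[A > 0])
    total = 0.5 * lam * np.sum(T ** 2)
    for Rk, FUk, FVk, Uk, Vk in zip(R, FU, FV, U, V):
        total += 0.5 * (sse(Rk, Uk, Vk) + alpha * sse(FUk, Uk, T) + beta * sse(FVk, Vk, T))
        total += 0.5 * lam * (np.sum(Uk ** 2) + np.sum(Vk ** 2))
    return total

def gradients(R, FU, FV, U, V, T, alpha, beta, lam):
    """Eqs. (13)-(15) in matrix form."""
    def res(A, P, Q):
        S = g(P.T @ Q)
        return (A > 0) * (S - A) * S * (1.0 - S)
    dU, dV, dT = [], [], lam * T
    for Rk, FUk, FVk, Uk, Vk in zip(R, FU, FV, U, V):
        E_R, E_U, E_V = res(Rk, Uk, Vk), res(FUk, Uk, T), res(FVk, Vk, T)
        dU.append(Vk @ E_R.T + alpha * (T @ E_U.T) + lam * Uk)
        dV.append(Uk @ E_R + beta * (T @ E_V.T) + lam * Vk)
        dT = dT + alpha * (Uk @ E_U) + beta * (Vk @ E_V)
    return dU, dV, dT

def fit(R, FU, FV, d=5, alpha=1.0, beta=1.0, lam=0.01, eps=1e-4, max_iter=200, seed=0):
    """Gradient descent with step halving, following the structure of Algorithm 1."""
    rng = np.random.default_rng(seed)
    L = FU[0].shape[1]
    U = [0.1 * rng.standard_normal((d, Rk.shape[0])) for Rk in R]
    V = [0.1 * rng.standard_normal((d, Rk.shape[1])) for Rk in R]
    T = 0.1 * rng.standard_normal((d, L))
    G = objective(R, FU, FV, U, V, T, alpha, beta, lam)
    history = [G]
    for _ in range(max_iter):                      # safety cap (assumption)
        dU, dV, dT = gradients(R, FU, FV, U, V, T, alpha, beta, lam)
        eta = 1.0
        while True:
            eta /= 2.0                             # halve until the objective decreases
            U_new = [Uk - eta * dUk for Uk, dUk in zip(U, dU)]
            V_new = [Vk - eta * dVk for Vk, dVk in zip(V, dV)]
            T_new = T - eta * dT
            G_new = objective(R, FU, FV, U_new, V_new, T_new, alpha, beta, lam)
            if G_new < G or eta < 1e-12:           # step-size floor (assumption)
                break
        if G_new >= G:                             # no descent step found
            break
        f = 1.0 - G_new / G                        # relative improvement
        U, V, T, G = U_new, V_new, T_new, G_new
        history.append(G)
        if f < eps:
            break
    R_hat = [Uk.T @ Vk for Uk, Vk in zip(U, V)]
    return R_hat, history
```

Calling `fit` with lists of per-domain matrices returns one predicted preference matrix per domain together with the sequence of objective values, which decreases at every accepted step by construction.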

4.4. A Compact Formulation 

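It is worth mentioning that we can present the objective function of GTagCDCF in a
compact form, which is mathematically equivalent to Eq. (12). The compact
formulation is expressed below:

$$
\begin{aligned}
G(U, V, T) ={}& \frac{1}{2} \sum_{i=1}^{M_0} \sum_{j=1}^{N_0} I_{ij}^{R} \Big[ R_{ij} - g(U_i^{\top} V_j) \Big]^2
+ \frac{\alpha}{2} \sum_{i=1}^{M_0} \sum_{l=1}^{L} I_{il}^{F^{(U)}} \Big[ F_{il}^{(U)} - g(U_i^{\top} T_l) \Big]^2 \\
&+ \frac{\beta}{2} \sum_{j=1}^{N_0} \sum_{l=1}^{L} I_{jl}^{F^{(V)}} \Big[ F_{jl}^{(V)} - g(V_j^{\top} T_l) \Big]^2
+ \frac{\lambda}{2} \Big( \|U\|_F^2 + \|V\|_F^2 + \|T\|_F^2 \Big)
\end{aligned}
\tag{16}
$$

where $M_0 = M_1 + M_2 + \ldots + M_K$ and $N_0 = N_1 + N_2 + \ldots + N_K$; $R$ is a block diagonal matrix that
contains all the preference data from $K$ domains; $F^{(U)}$ is a stacked matrix that contains
the user-tag relations from all the users of $K$ domains; and $F^{(V)}$ is a stacked matrix
that contains the item-tag relations from all the items of $K$ domains. Specifically, these
variables are shown below:

$$
R = \begin{bmatrix} R^{(1)} & & & \\ & R^{(2)} & & \\ & & \ddots & \\ & & & R^{(K)} \end{bmatrix}, \quad
F^{(U)} = \begin{bmatrix} F^{(U_1)} \\ F^{(U_2)} \\ \vdots \\ F^{(U_K)} \end{bmatrix}, \quad
F^{(V)} = \begin{bmatrix} F^{(V_1)} \\ F^{(V_2)} \\ \vdots \\ F^{(V_K)} \end{bmatrix}
\tag{17}
$$

Correspondingly, $U$ and $V$ are stacked matrices that contain the latent factors of users
and items, respectively, from $K$ domains, concatenated column-wise as shown below:

$$
U = \begin{bmatrix} U^{(1)} & U^{(2)} & \cdots & U^{(K)} \end{bmatrix}, \quad
V = \begin{bmatrix} V^{(1)} & V^{(2)} & \cdots & V^{(K)} \end{bmatrix}
\tag{18}
$$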



We consider that the compact formulation of GTagCDCF could help practitioners to 
implement the algorithm more easily. 
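
The equivalence between the compact and the per-domain formulation can be checked with a short sketch, assuming Python with NumPy and dense arrays with zeros for unobserved entries. The names `block_diag`, `stack`, `objective_compact` and `objective_per_domain` are hypothetical. The off-diagonal blocks of the stacked preference matrix are zero, so they are treated as unobserved and contribute nothing to the sum.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def sse(A, P, Q):
    """Squared error over the observed (non-zero) entries of A."""
    return np.sum(((A - g(P.T @ Q)) ** 2)[A > 0])

def block_diag(blocks):
    """Place the matrices in `blocks` on the diagonal of a zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def stack(Rs, FUs, FVs, Us, Vs):
    """Build the stacked matrices: block-diagonal R, row-stacked F, column-stacked U and V."""
    return block_diag(Rs), np.vstack(FUs), np.vstack(FVs), np.hstack(Us), np.hstack(Vs)

def objective_compact(R, FU, FV, U, V, T, alpha, beta, lam):
    """Compact objective evaluated on the stacked matrices."""
    fit = sse(R, U, V) + alpha * sse(FU, U, T) + beta * sse(FV, V, T)
    reg = np.sum(U ** 2) + np.sum(V ** 2) + np.sum(T ** 2)
    return 0.5 * fit + 0.5 * lam * reg

def objective_per_domain(Rs, FUs, FVs, Us, Vs, T, alpha, beta, lam):
    """Per-domain objective, summed domain by domain."""
    total = 0.5 * lam * np.sum(T ** 2)
    for Rk, FUk, FVk, Uk, Vk in zip(Rs, FUs, FVs, Us, Vs):
        total += 0.5 * (sse(Rk, Uk, Vk) + alpha * sse(FUk, Uk, T) + beta * sse(FVk, Vk, T))
        total += 0.5 * lam * (np.sum(Uk ** 2) + np.sum(Vk ** 2))
    return total
```

With the stacked matrices in hand, a single-domain implementation of the objective can be reused unchanged for the multi-domain case, at the cost of storing the zero off-diagonal blocks unless a sparse representation is used.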

4.5. Discussion 

We finalized our presentation of the proposed GTagCDCF approach with a brief sum- 
mary and discussion of the ways in which the algorithm addresses the specific chal- 
lenges faced in CDCF. 



(1) Explicit Knowledge Transfer. GTagCDCF utilizes explicitly shared knowledge,
in the form of common tags, to introduce mutual benefit between domains. Compared
to related work [Li et al. 2009a; Li et al. 2009b; Pan et al. 2010; Pan et al. 2011],
in which implicit correlations are mined from user preference data,
we emphasize that the characteristic of explicit knowledge transfer could make
GTagCDCF more effective in connecting users and items from different domains,
leading to an improved recommendation performance in each domain.

(2) Heterogeneous Knowledge Transfer. GTagCDCF is suitable for cross-domain
cases with heterogeneous user preference data. GTagCDCF can be widely applied
in practice, since recommender systems often have different mechanisms that
allow users to express their preferences for items. Compared to the related work
[Li et al. 2009a; Li et al. 2009b], in which the CDCF approaches can only tackle
different domains with the same rating scale, we emphasize that GTagCDCF could
enjoy the advantage of heterogeneous knowledge transfer, since the latent features
of common tags provide a basis for learning the latent features of users and items
with different types of preference data from different domains.

(3) Generalized Knowledge Transfer. Compared to our previous work [Shi et al. 2011],
GTagCDCF is a generalized version of TagCDCF in that it directly exploits
user-tag relations and item-tag relations, while not relying on any particular
similarity measurement to capture the relations between users or items from different
domains. In addition, we emphasize that GTagCDCF provides a generalized
framework to utilize any explicit common characteristics (not only common tags)
for introducing mutual benefits between recommender domains.

The ability of GTagCDCF to face these challenges is gratifying, but not entirely
surprising, given the success of single-domain approaches to PMF that exploit the
simultaneous factorization of multiple matrices. As mentioned in Section 2, collective
matrix factorization (CMF) [Singh and Gordon 2008] is such an approach, proposed
to factorize multiple matrices of metadata/attributes related to users and items. For
a better understanding of the difference between GTagCDCF and CMF, we illustrate
the graphical model of CMF, under the same variable representations, in Fig. 3.
Compared with Fig. 2, it can be seen that the original CMF is a single-domain model,
which learns the latent factors based on the rating data and the metadata (comparable
to the tags in this paper) within one domain. In contrast, GTagCDCF is a cross-domain
model, in which the latent factors from one domain could influence those in another
domain based on the shared metadata (the common tags in this paper). Note that in
case it would be appropriate to conflate individual user/item sets into one large set,
cross-domain recommendation can also be expressed in the CMF framework. More
fundamentally, CMF is essentially an extended version of PMF [Adams et al. 2010;
Porteous et al. 2010], specializing in incorporating side information with the
factorization of the preference data. As mentioned in Section 2, GTagCDCF also builds on PMF,
while specializing in incorporating side information from multiple domains with the
factorization of the preference data.

Fig. 3. Graphical model of CMF.

In sum, GTagCDCF is designed to address the challenges in CDCF effectively and thereby
improve recommendation performance in different domains. In the remainder of the
paper, we present our extensive experimental analysis that demonstrates the strength
of GTagCDCF.



5. EXPERIMENTAL EVALUATION 

In this section, we present the experiments we conducted to evaluate the proposed 
GTagCDCF. We first give a detailed description of the datasets that are used in our ex- 
periments, and the experimental setup. Then, we investigate the impact of the trade- 
off parameters of GTagCDCF and the effectiveness of GTagCDCF. Finally, we demon- 
strate the performance of GTagCDCF by comparison with several CF and CDCF ap- 
proaches. The research questions that need to be answered through the experiments 
are formulated as follows: 

(1) Could GTagCDCF be effective for introducing mutual benefit between different 
recommender domains so as to improve recommendations in each domain? 

(2) Could GTagCDCF outperform single-domain CF approaches? 

(3) Could GTagCDCF outperform other state-of-the-art CDCF approaches by exploit- 
ing the common tags between domains? 

(4) How would GTagCDCF perform for users with different rating profiles and differ- 
ent tagging behavior? 

(5) How robust is GTagCDCF with respect to the scale of the datasets and the
number of involved domains?



5.1. Datasets 

Our experiments are organized as two parts: one for the two-domain case and the other 
for the three-domain case. For the two-domain case, the experiments are conducted on
two pairs of publicly available datasets. The first pair of datasets consists of one subset
from the MovieLens 10 million dataset¹ [Herlocker et al. 1999] and one subset from the
LibraryThing dataset² [Clements et al. 2010]. The original MovieLens dataset contains
10 million ratings from 71576 users and 10681 movies, and in the subset we select the
first 5000 users and 5000 movies according to the identifiers in the original dataset.
The original LibraryThing dataset contains ca. 750 thousand ratings from 7279 users
and 37232 books, and in the subset we also select the first 5000 users and 5000 books.
This subset selection was necessary, since some baselines are too computationally
expensive to tackle larger datasets. Note that our choice of this subset selection procedure
rather than random selection also ensures future experimental reproducibility. In the
following, the two subsets used are denoted as ML1 (from MovieLens) and LT (from
LibraryThing). Both ML1 and LT have a 5-star rating scale with half-star increments,
representing a case in which two different domains both have explicit ratings. In addition
to ratings, the two domains also have 2277 common tags. Our aim in experimenting
on this pair of datasets is to investigate the effectiveness of our proposed GTagCDCF
for benefitting different recommender domains that are both based on explicit
ratings. For notational convenience, we refer to this pair of datasets as P1 in the following.
The second pair of datasets is from recent initiatives on information heterogeneity
and fusion in recommender systems³ [Cantador et al. 2011], where three datasets are
provided with various types of user preferences and resources from three different
domains, i.e., movie, webpage and music. Two of these datasets are used in our work. One dataset
is an extended version of the MovieLens 10 million dataset⁴ with additional public
information from the IMDb⁵ and Rotten Tomatoes⁶ websites. In the following, we denote this
dataset as ML2. ML2 contains ca. 850 thousand ratings (with the same scale as ML1,
as mentioned before) from 2113 users and 10197 movies. The other dataset is collected
from the Last.fm⁷ online music system, mainly consisting of user-artist listening
information, i.e., the frequency with which a user listened to songs from a music artist. In the
following, we denote this dataset as LF. LF represents a dataset from implicit-feedback
recommender systems, where no explicit user preferences are expressed. LF contains
ca. 93 thousand user-artist listening relations, i.e., listening counts, from 1892 users
and 17632 artists. In addition, there are in total 996 tags that are common to the two
domains. In contrast to P1, our aim in experimenting on this pair of datasets is to
investigate the effectiveness of our proposed GTagCDCF for benefitting different
recommender domains that have different types of user preferences. We refer to this
pair of datasets as P2 in the remainder of the paper. The statistics of all the datasets
involved in the two-domain experiments are summarized in Table I.

For the three-domain case, the experiments are conducted by using the entire
MovieLens 10 million dataset (denoted as ML-all, representing the movie domain), the
entire LibraryThing dataset (denoted as LT-all, representing the book domain) and the
LF dataset (representing the music domain). Apart from the 2277 common tags
between the ML-all dataset and the LT-all dataset as shown in Table I, there are 843
common tags between the ML-all dataset and the LF dataset, and 1152 common tags
between the LT-all dataset and the LF dataset.



¹ http://www.grouplens.org/node/73
² http://dmirlab.tudelft.nl/users/maarten-clements
³ http://ir.ii.uam.es/hetrec2011/datasets.html
⁴ http://www.grouplens.org/node/462
⁵ http://www.imdb.com
⁶ http://www.rottentomatoes.com
⁷ http://www.last.fm

Table I. Statistics of datasets in P1 and P2.

                                                   P1                  P2
                                                 ML1        LT       ML2        LF
#users                                          5000      5000      2113      1892
#items                                          5000      5000     10109     17632
#preferences                                  584628    179419    855598     92834
sparseness                                    97.70%    99.30%       96%    99.70%
#common tags                                      2277                996
#relations between users and common tags         559    157932      2551      9402
#relations between items and common tags       10778     97277      4016     28267
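
The sparseness values in Table I follow directly from the numbers of users, items and known preferences. The short sketch below recomputes them, assuming that sparseness is defined as the fraction of user-item pairs without a known preference; the function name `sparseness` is introduced here for illustration only.

```python
def sparseness(n_users, n_items, n_preferences):
    """Fraction of user-item pairs for which no preference is known."""
    return 1.0 - n_preferences / (n_users * n_items)


# (users, items, preferences) as reported in Table I.
datasets = {
    "ML1": (5000, 5000, 584628),
    "LT": (5000, 5000, 179419),
    "ML2": (2113, 10109, 855598),
    "LF": (1892, 17632, 92834),
}

for name, (users, items, prefs) in datasets.items():
    print(f"{name}: {100 * sparseness(users, items, prefs):.2f}%")
```

Under this definition the results agree with the table after rounding, which also confirms that all four matrices have fewer than 5% of their entries observed.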



Note that in the following evaluation, our focus is on the two-domain case for the
purpose of demonstrating the usefulness of the GTagCDCF model, while the
experiments of the three-domain case only serve to validate the robustness of GTagCDCF to
larger-scale datasets and to use cases with more than two domains.



5.2. Experimental Setup 

5.2.1. Experimental Protocol for Two-Domain Case. The experimental protocol we used in
this work for the two-domain case is similar to some recent CDCF work [Li et al. 2009a;
Li et al. 2009b]. In our experiments, we split each of the datasets in both P1 and P2
into three sets containing different users, i.e., a training set, a validation set and a test
set. For each dataset, the training set contains 60% randomly selected users and the 
users' preferences (ratings or listening counts) to items (movies, books or artists). The 
validation set contains 20% randomly selected users and their preferences to items. 
The test set contains the remaining 20% users and their preferences to items. The val- 
idation set is used to investigate the impact of different parameters in the proposed 
GTagCDCF algorithm and also tune parameters for all the baseline approaches. The 
test set is used to evaluate the performance of GTagCDCF and compare it with other 
baselines. For the purpose of investigating the performance of GTagCDCF for users 
with different rating profiles, we hold out preferences of users in the test set to be 
predicted, according to different conditions of user profile length (UPL). For example, 
under the condition of UPL=5, we use 5 randomly selected preferences for each user 
in the test set. The user profiles of length 5 are taken together with the training set 
as training data, and we use the remaining preferences of the users in the test set 
for evaluation. In our experiments, we evaluate three different conditions of UPL, i.e., 
UPL=5, 10, 15. For each condition of UPL, we generate 10 test data folds (each fold 
consisting of known preferences and holdout preferences) by random selection, and we 
report performance as an average across all of the 10 folds. In addition, in order to
guarantee that each user in the test set has sufficient holdout preferences to be
evaluated, we filter out users in the test set who have fewer than 20 preferences. Therefore,
even for the condition of UPL=15, we still have at least 5 items for each user in the
test set for evaluation.
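
The protocol above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function names, the use of a seeded `random.Random`, and the dictionary layout (user mapped to a dictionary of item preferences) are choices made for this example and are not taken from the original experimental code.

```python
import random


def split_users(user_ids, seed=0):
    """Randomly assign users to training (60%), validation (20%) and test (20%) sets."""
    users = list(user_ids)
    random.Random(seed).shuffle(users)
    n_train = int(0.6 * len(users))
    n_valid = int(0.2 * len(users))
    return users[:n_train], users[n_train:n_train + n_valid], users[n_train + n_valid:]


def make_test_fold(test_profiles, upl, rng, min_prefs=20):
    """For each test user, keep `upl` random preferences as known and hold out the rest."""
    known, holdout = {}, {}
    for user, prefs in test_profiles.items():
        if len(prefs) < min_prefs:  # users with fewer than 20 preferences are filtered out
            continue
        items = list(prefs)
        rng.shuffle(items)
        known[user] = {i: prefs[i] for i in items[:upl]}
        holdout[user] = {i: prefs[i] for i in items[upl:]}
    return known, holdout
```

Calling `make_test_fold` ten times with differently seeded generators would yield the 10 test folds per UPL condition; the known part of each fold is added to the training data and the holdout part is used for evaluation.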



5.2.2. Experimental Protocol for Three-Domain Case. As mentioned in Section 5.1, the
purpose of the experiments for the three-domain case is only to validate the robustness
of GTagCDCF. For this reason, we simply evaluate the performance of GTagCDCF
by cross-validation, without examining different user profiles in detail. Specifically,
we split each of the three datasets into 5 disjoint folds (i.e., each fold contains 20%
of the preference data of the corresponding dataset), from which we randomly select
one for tuning the parameters in GTagCDCF and in the baseline approaches, and use
the other four folds for cross-validation. The reported performance is averaged across
the four folds of each dataset.

5.2.3. Evaluation Metric. To be consistent with the majority of recent related work on
CDCF [Li et al. 2009a; Li et al. 2009b; Pan et al. 2010; Pan et al. 2011], we adopt
mean absolute error (MAE) as the evaluation metric for measuring recommendation
performance on rating-based recommender domains, i.e., ML1 and LT in P1 and ML2
in P2. Specifically, MAE is defined as follows:

$$
MAE = \frac{\sum_{(i,j) \in T_E} |\hat{R}_{ij} - R_{ij}|}{|T_E|} \qquad (19)
$$

in which $T_E$ denotes the set of user-item pairs whose ratings need to be predicted,
and $|T_E|$ denotes the number of user-item pairs in the set. $\hat{R}_{ij}$ denotes the predicted
rating for user i to item j, and $R_{ij}$ the corresponding ground truth. A lower value of MAE
indicates a better recommendation performance.
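
A direct transcription of the MAE definition into code may help when reproducing the evaluation. The sketch below assumes that ratings are stored in dictionaries keyed by (user, item) pairs, which is a choice made for this example only; the ratings shown are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """MAE over the user-item pairs in `actual`, a dict keyed by (user, item)."""
    total = sum(abs(predicted[pair] - rating) for pair, rating in actual.items())
    return total / len(actual)


# Hypothetical holdout ratings and predictions on a 5-star scale with half-star steps.
actual = {("u1", "i1"): 4.0, ("u1", "i2"): 2.5, ("u2", "i1"): 5.0}
predicted = {("u1", "i1"): 3.5, ("u1", "i2"): 3.0, ("u2", "i1"): 4.0}

mae = mean_absolute_error(predicted, actual)  # (0.5 + 0.5 + 1.0) / 3
```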

For measuring recommendation performance on the LF dataset, whose domain only has
implicit user feedback, i.e., listening counts, we choose to use mean average precision
(MAP) [Herlocker et al. 2004], an evaluation metric that is widely used to evaluate the
quality of a ranked list. Specifically, MAP is defined as follows:

$$
MAP = \frac{1}{M_{te}} \sum_{i=1}^{M_{te}} \frac{\sum_{j=1}^{N_i} rel_i(j) \times P_i@j}{\sum_{j=1}^{N_i} rel_i(j)} \qquad (20)
$$

where $M_{te}$ is the number of users for evaluation, and $N_i$ denotes the number of
recommended items (i.e., artists in the LF dataset) for user i. $rel_i(j)$ is a binary indicator,
which is equal to 1 if the item of rank j is relevant to user i, and is equal to 0
otherwise. $P_i@j$ is the precision of the top j recommended items for user i, i.e., the ratio
of items in the top j recommendations that are relevant to user i. Higher values
of MAP indicate a better recommendation performance. Note that in order to
determine the relevance of each artist to a given user in the test set, we set a threshold
of listening counts for each test user proportionate to the maximal listening count in
her holdout set of feedback (listening counts to artists). The proportion coefficient for
the threshold is set to 0.7 in our work. For example, if the holdout set of feedback for
a test user has a maximal value of 1000, i.e., the maximal listening count of that user
for the artists that are to be ranked is 1000, then we regard the artists
to whom the user listened no fewer than 700 (i.e., 0.7 × 1000) times as relevant ones.
Although other strategies can be considered to determine the relevance of artists, they
would not influence the result of our comparative study in the experiments as long
as they rest on the assumption that the user would prefer one artist to another if she
listened to the one more than the other. In addition, note that our evaluation is only
based on the artists who were listened to by a given test user. The artists that were not
listened to by the user are not taken into account in the recommendation list, since there
is no ground truth of user preference for those artists. Readers who have interest in
this issue could further refer to the work of [Cremonesi et al. 2010; McLaughlin and
Herlocker 2004]. For this reason, our evaluation on the LF dataset is conservative,
and could have underestimated the power of recommendation approaches,
as shown later in Section 5.5. However, this issue would not influence our comparative
experimental evaluation in the following experiments, where we are interested in the
relative strength of different approaches.
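
The relevance threshold and the MAP computation described above can be sketched as follows. This is a minimal illustration: the function names and the dictionary layout are assumptions made for this example, and the ranked list of each user is assumed to contain only artists from that user's holdout set, in line with the conservative evaluation described in the text.

```python
def relevant_items(holdout_counts, coefficient=0.7):
    """Items whose listening count is at least `coefficient` times the user's maximal count."""
    threshold = coefficient * max(holdout_counts.values())
    return {item for item, count in holdout_counts.items() if count >= threshold}


def average_precision(ranked_items, relevant):
    """Average of the precision values at the ranks where a relevant item appears."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings, holdouts, coefficient=0.7):
    """MAP over all test users; `rankings` and `holdouts` are dicts keyed by user."""
    scores = [
        average_precision(rankings[user], relevant_items(holdouts[user], coefficient))
        for user in rankings
    ]
    return sum(scores) / len(scores)


# Hypothetical holdout listening counts and predicted ranking for one test user.
holdouts = {"u1": {"a": 1000, "b": 750, "c": 120, "d": 40}}
rankings = {"u1": ["a", "c", "b", "d"]}
map_score = mean_average_precision(rankings, holdouts)
```

In this toy case the threshold is 0.7 times 1000, so artists "a" and "b" are relevant; they appear at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2.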

5.2.4. Parameter Setting. In the following experiments, we set the dimensionality d of
latent features in GTagCDCF to be 10. We notice that, just like in a common MF
technique [Weimer et al. 2008], the performance of GTagCDCF did not substantially
change when further increasing d, while more computational cost is required (cf.
Section 4.3). The regularization parameter λ of GTagCDCF is set to 0.01 for the
two-domain case of P1, 0.001 for the two-domain case of P2, and 0.01 for the three-domain
case. The choice of λ for each case is based on our observation of the performance of
GTagCDCF on the corresponding validation sets/folds. The impact of d and λ on
recommendation performance has been widely investigated in related work, particularly on
MF [Koren et al. 2009; Salakhutdinov and Mnih 2008; Weimer et al. 2008]. In the
following, we will focus on the impact of the two tradeoff parameters α and β, which represent
the key characteristics of GTagCDCF.

Fig. 4. The impact of tradeoff parameter α on the recommendation performance of GTagCDCF for the two
domains in P1. Panels: (a) ML1, (b) LT; horizontal axis: tradeoff parameter alpha.

Fig. 5. The impact of tradeoff parameter α on the recommendation performance of GTagCDCF for the two
domains in P2. Panels: (a) ML2, (b) LF; horizontal axis: tradeoff parameter alpha.

5.3. Impact of Tradeoff Parameters 

The tradeoff parameters α and β in the proposed GTagCDCF algorithm influence the
relative contributions from the cross-domain user-tag relations and the cross-domain
item-tag relations. By using the validation set together with the training set in each
of P1 and P2, we investigate the impact of the tradeoff parameters on GTagCDCF
by varying their values and measuring the recommendation performance in terms of
MAE on the ML1, LT and ML2 datasets and MAP on the LF dataset. Note that in the
experiments of this section, we set the condition of UPL=5 for the users in the validation set,
while measuring recommendation performance according to the holdout set of ratings
or implicit feedback as mentioned in the previous subsection. We first set β = 0 in
Eq. (12) and investigate the impact of α, as shown in Fig. 4 and Fig. 5. It can be seen
that for the cross-domain case of P1, the optimal value of α lies around 0.1 in terms of
MAE for both the ML1 domain and the LT domain. In addition, it can also be seen that there is
an optimal value of α, which lies around 1, in terms of MAE in the ML2 domain and MAP
in the LF domain, when GTagCDCF is used for the case of P2. This observation indicates
that by exploiting the common tags from different domains, GTagCDCF improves
the latent user features in each domain, thus resulting in an improved
recommendation performance.

Fig. 6. The impact of tradeoff parameter β on the recommendation performance of GTagCDCF for the two
domains in P1. Panels: (a) ML1, (b) LT; horizontal axis: tradeoff parameter beta.

Fig. 7. The impact of tradeoff parameter β on the recommendation performance of GTagCDCF for the two
domains in P2. Panels: (a) ML2, (b) LF; horizontal axis: tradeoff parameter beta.

By adopting the optimal value of α = 0.1 for the cross-domain case of P1, we further
investigate the impact of tradeoff parameter β in Eq. (12) on the recommendation
performance of GTagCDCF. We also investigate the impact of tradeoff parameter
β on GTagCDCF for the cross-domain case of P2, with α fixed to the optimal value 1.
The results are shown in Fig. 6 and Fig. 7. As can be seen in Fig. 6, the optimal value
of β lies around 0.1 in terms of MAE for both domains in P1. In addition, it can
also be seen in Fig. 7 that the optimal value of β lies around 1 in terms of MAE in the ML2
domain and MAP in the LF domain. These results indicate that by exploiting
the common tags from different domains, GTagCDCF also improves the
latent item features in each domain in addition to the latent user features, thus resulting
in a further improved recommendation performance in each domain.

Based on our observations of the impact of the tradeoff parameters as studied above, we
can infer a positive answer to our first research question.
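
The two-step tuning procedure used in this subsection (first α with β = 0, then β with α fixed to its best value) can be sketched as follows. This is a minimal sketch under stated assumptions: `toy_error` is a hypothetical stand-in for training GTagCDCF and measuring MAE on the validation set, and the grid simply mirrors the values on the horizontal axes of the figures.

```python
import math

GRID = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]


def tune_tradeoffs(validation_error, grid=GRID):
    """Pick alpha with beta = 0, then pick beta with alpha fixed (lower error is better)."""
    alpha = min(grid, key=lambda a: validation_error(a, 0.0))
    beta = min(grid, key=lambda b: validation_error(alpha, b))
    return alpha, beta


def toy_error(alpha, beta):
    """Hypothetical validation MAE; a real run would train the model for each setting."""
    penalty_alpha = (math.log10(alpha) + 1) ** 2                     # smallest at alpha = 0.1
    penalty_beta = (math.log10(beta) + 1) ** 2 if beta > 0 else 4.0  # smallest at beta = 0.1
    return 0.75 + 0.01 * (penalty_alpha + penalty_beta)


best_alpha, best_beta = tune_tradeoffs(toy_error)
```

For a domain evaluated with MAP, where higher values are better, the same routine could be applied to the negated metric.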

5.4. Effectiveness 

In the second set of our experiments, we investigate the effectiveness of GTagCDCF,
i.e., the effect of the minimization of the objective function in Eq. (12) on the
improvement of the recommendation performance in different domains. Note that these
experiments are also based on the validation sets under the condition of UPL=5, and the
tradeoff parameters are set to the optimal values for P1 and P2, respectively,
as observed in the previous subsection. To this end, we track the value
of the objective function and the evaluation metrics, i.e., MAE for ML1, LT
and ML2, and MAP for LF on the validation sets, simultaneously during the iterations
of the optimization process. The results are shown in Fig. 8. As can be seen, when
minimizing the objective function of GTagCDCF, the recommendation performance in
each domain (for both P1 and P2) improves along with the algorithm iterations, nearly
reaching convergence after 100 iterations. This observation allows us to confirm
a positive answer to our first research question.

Fig. 8. The effectiveness of GTagCDCF in improving the recommendation performance (MAE or MAP) in
different domains.



5.5. Performance Comparison in the Two-domain Case 

In this subsection, we compare the performance of the proposed GTagCDCF with a set
of alternative recommendation approaches listed below. As mentioned in Section 5.2,
the performance is reported based on the test set with 10 randomly separated folds.
The tradeoff parameters are the optimal ones determined using the validation set as
stated in Section 5.3, i.e., α = 0.1, β = 0.1 for ML1 and LT in P1, and α = 1, β = 1 for ML2
and LF in P2.



— UBCF: User-based collaborative filtering [Herlocker et al. 1999] is used as a
representative of memory-based CF approaches. The neighborhood size is tuned to 50
according to our observation of its performance on the validation sets.

— PMF: Probabilistic matrix factorization [Salakhutdinov and Mnih 2008] is a
state-of-the-art model-based CF approach. The regularization parameter λ is tuned based
on the validation set of each dataset, i.e., 0.01 for ML1 and LT, and 0.001 for ML2
and LF. Note that both UBCF and PMF are CF approaches for a single-domain use
case.

— CBT: Codebook transfer [Li et al. 2009a] represents a state-of-the-art cross-domain
CF approach. For the two domains in P1, we use one domain (e.g., ML1) as the
auxiliary domain, which is used to construct a codebook, and the other domain (e.g.,
LT) as the target domain in which the recommendations are generated. Following
the experimental protocol used in [Li et al. 2009a], we select the 500 users and 500 items
with the most ratings to construct the auxiliary domain, and set the number of clusters
to 50 for both users and items. Note that CBT is not applicable for the case of P2,
since it requires that the two domains share the same rating scale.

— RMGM: Rating-matrix generative model [Li et al. 2009b] represents another
state-of-the-art cross-domain CF approach. The implementation of RMGM is based on the
publicly available code package⁸ supplied by the authors. As suggested in [Li et al.
2009b], we set the number of both the user and the item clusters to be 20. Similar
to CBT, RMGM is not applicable for the case of P2, since it also requires that the two
domains share the same rating scale. Note that in both CBT and RMGM, the related
parameters are also tuned based on the performance measured on the validation
sets, as for GTagCDCF.

— TagCDCF: Tag-induced cross-domain collaborative filtering is from our previous
work [Shi et al. 2011], which represents a cross-domain CF approach with explicit
knowledge transfer between domains. The difference between TagCDCF and GTagCDCF has
been discussed in Section 4.5. Note that all the parameters involved in TagCDCF
are tuned by the same means as in GTagCDCF according to observations from the
validation sets.



The results of the comparative analysis are shown in Table II and Table III, from
which we have several observations regarding the proposed algorithm GTagCDCF, as
summarized below.

First, as can be seen in both Table II and Table III, GTagCDCF significantly
outperforms single-domain CF approaches, i.e., UBCF and PMF, by a large margin. Note that
the significance of improvement is measured according to the Wilcoxon signed rank
significance test with p<0.005. For the two domains in P1, GTagCDCF improves over UBCF
by 7-8% on ML1 and 14-19% on LT, and over PMF by up to 8.5% on ML1 and up to
10% on LT. A similar amount of improvement can also be observed in the cross-domain
case of P2. In addition, we notice that other cross-domain CF approaches, i.e., CBT
and RMGM, also consistently outperform single-domain approaches. In view of this
result, we can conclude that GTagCDCF, as a CDCF approach, succeeds in benefiting
recommendations in different domains by exploiting the common knowledge between
domains. This observation allows us to confirm a positive answer to our second
research question.

Second, we can observe from Table II that GTagCDCF also significantly outperforms
the other CDCF approaches, i.e., by 3-4% and ca. 3.5-4.5% over CBT on ML1 and LT, and
by 1.5-4.5% and ca. 3.5-7.5% over RMGM on ML1 and LT. Note that TagCDCF also
achieved substantial improvements over CBT and RMGM. The results indicate that
exploiting explicit common knowledge between domains could be more effective than
relying on implicit common patterns between domains for the purpose of CDCF,
verifying that GTagCDCF succeeds in introducing added value through common tags
compared to existing CDCF approaches. In addition, GTagCDCF outperforms TagCDCF
significantly as well. The relative improvement amounts to, for the case of P1, 1.4-2.0%
on ML1 and 0.5-2.5% on LT in terms of MAE, and for the case of P2, 2.3-5.6% on
ML2 in terms of MAE and 6.8-14.7% on LF in terms of MAP. This result indicates
that GTagCDCF succeeds in introducing the frequency of user-tag relations and the
frequency of item-tag relations to maximally benefit from common tags for
improving recommendations in each domain. These observations allow us to give a
positive answer to our third research question.
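
The relative improvements quoted above can be recomputed from the tabulated mean values. The sketch below assumes that the improvement in MAE is measured as the reduction relative to the baseline value; the function name is introduced here for illustration, and the inputs are the mean MAE values of CBT and GTagCDCF on ML1 from Table II.

```python
def relative_improvement(baseline_mae, model_mae):
    """Relative MAE reduction of the model over the baseline, in percent."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Mean MAE on ML1 for UPL = 5, 10, 15, taken from Table II.
cbt_ml1 = [0.792, 0.752, 0.737]
gtagcdcf_ml1 = [0.761, 0.721, 0.709]

improvements = [relative_improvement(b, m) for b, m in zip(cbt_ml1, gtagcdcf_ml1)]
```

Under this assumption all three values fall between 3.5% and 4.5%, in line with the improvement over CBT on ML1 reported in the text. For MAP, where higher is better, the sign of the difference would be reversed.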

Third, the improvement achieved by GTagCDCF is consistent for users with
different profiles, as shown in all the domains of both P1 and P2. In addition, the results also
demonstrate that the performance of GTagCDCF generally improves as the number of
ratings in the user profile increases, indicating that GTagCDCF not only benefits from
the common tags between domains, but also sufficiently exploits rating data from
both domains. Although most CF approaches have this characteristic, we emphasize
that it is important to have recommender systems in a healthy cycle where users are
encouraged to contribute more in order to attain better recommendations. The
analysis positively answers our fourth question concerning the effect of user rating profiles
on GTagCDCF.

⁸ http://sites.google.com/site/libin82cn/

Table II. Comparison of recommendation performance between GTagCDCF and the baseline approaches on datasets in P1.

          UPL=5                       UPL=10                      UPL=15
          ML1 (MAE)     LT (MAE)      ML1 (MAE)     LT (MAE)      ML1 (MAE)     LT (MAE)
UBCF      0.833±0.009   0.857±0.009   0.785±0.004   0.795±0.005   0.766±0.003   0.771±0.004
PMF       0.831±0.010   0.771±0.009   0.771±0.010   0.761±0.006   0.753±0.010   0.769±0.010
CBT       0.792±0.009   0.729±0.010   0.752±0.003   0.694±0.008   0.737±0.003   0.682±0.006
RMGM      0.780±0.010   0.745±0.010   0.756±0.006   0.720±0.004   0.720±0.004   0.681±0.005
TagCDCF   0.777±0.008   0.709±0.004   0.735±0.004   0.674±0.003   0.719±0.003   0.663±0.002
GTagCDCF  0.761*±0.008  0.691*±0.004  0.721*±0.004  0.667*±0.002  0.709*±0.003  0.659*±0.003

Note: "*" denotes a significant improvement of GTagCDCF over all the other approaches, according to Wilcoxon signed
rank significance test with p<0.005.

Table III. Comparison of recommendation performance between GTagCDCF and the baseline approaches on datasets in P2.

          UPL=5                       UPL=10                      UPL=15
          ML2 (MAE)     LF (MAP)      ML2 (MAE)     LF (MAP)      ML2 (MAE)     LF (MAP)
UBCF      0.803±0.010   0.134±0.008   0.761±0.004   0.152±0.011   0.742±0.004   0.156±0.009
PMF       0.766±0.009   0.151±0.009   0.714±0.003   0.162±0.007   0.668±0.004   0.168±0.012
TagCDCF   0.748±0.009   0.162±0.010   0.686±0.003   0.175±0.011   0.653±0.003   0.190±0.013
GTagCDCF  0.707*±0.007  0.176*±0.008  0.655*±0.002  0.201*±0.008  0.638*±0.003  0.203*±0.013

Note: "*" denotes a significant improvement of GTagCDCF over all the other approaches, according to Wilcoxon signed
rank significance test with p<0.005.

Table IV. Performance comparison for users with different number of used common tags in LT of P1.

# common tags (# users)   CBT       RMGM      TagCDCF   GTagCDCF
<5 (20)                   0.736     0.716     0.728     0.712
5~10 (62)                 0.683     0.681     0.644     0.642
11~20 (113)               0.671     0.678     0.659     0.652
20~50 (193)               0.707     0.702     0.693     0.684
>50 (175)                 0.665     0.664     0.641     0.639

Finally, we investigate the impact of the number of common tags that serve as an
explicit information source on transferring knowledge between domains. This
investigation is done by a closer look at the comparison between the CDCF approaches with
tags, i.e., TagCDCF and GTagCDCF, and the CDCF approaches without tags, i.e., CBT
and RMGM. We compare their performance on the users with different numbers of used
common tags, as shown in Table IV. Note that we conduct this experiment only on the
test set users of the LT dataset under the condition of UPL=15, since the number of tags
that the users assigned in ML1 varies in a much narrower range than in LT, so that it
can hardly provide a convincing observation. As can be seen, for the users who used
fewer than 5 common tags, the performance of TagCDCF and GTagCDCF is not clearly
better (or is even worse) than that of CBT or RMGM. However, users who used 5 or more
common tags are able to benefit from TagCDCF and GTagCDCF, the performance of
which becomes superior to CBT and RMGM. According to this observation, we can
conclude that the common tags, serving as an explicit information source, could be more
effective than implicit patterns mined from user preference for the purpose of
transferring knowledge, and users who are active in tagging could particularly benefit from
the proposed approach. We also notice that the users who used fewer than 5
common tags are not the majority, indicating that the threshold of used common tags for
GTagCDCF to be beneficial is not difficult to reach in practice. This analysis provides
our answer to the fourth research question concerning the user tagging behavior.

Table V. Comparison of recommendation performance between PMF,
SVD++ and GTagCDCF on datasets in the three-domain case.

          ML-all (MAE)    LT-all (MAE)    LF (MAP)
PMF       0.610±0.007     0.656±0.002     0.416±0.005
SVD++     0.604±0.004     0.652±0.001     -
GTagCDCF  0.596*±0.001    0.644*±0.001    0.430*±0.003

Note: "*" denotes a significant improvement of GTagCDCF over all
the baselines, according to Wilcoxon signed rank significance test
with p<0.005.



5.6. Performance Comparison in the Three-domain Case 



As mentioned in Section 5.2, to validate the robustness of GTagCDCF to larger
datasets and use cases with more than two domains, we evaluate GTagCDCF in the
three-domain case. The tradeoff parameters are tuned to α = 10 and β = 1, based on
the performance measured on the randomly selected fold, as described in Section 5.2.
Then, the performance of GTagCDCF is measured via 4-fold cross-validation. Due to
the relatively large size of the involved datasets, some baselines used in the
two-domain case in the previous subsection, such as UBCF and TagCDCF, are too
computationally expensive to be deployed. In addition, the baselines CBT and RMGM
in the previous subsection were not proposed for use cases with more than two
domains. For these reasons, only two single-domain baselines are employed for
performance comparison. One is PMF, which is the same as described in the previous
subsection. The other is SVD++ [Koren 2008], which is also a state-of-the-art CF
approach renowned from the Netflix Prize competition. The implementation of SVD++
in our experiments is based on the publicly available software MyMediaLite [Gantner
et al. 2011]. The dimensionality of the latent factors in both baselines is set to 10,
the same as for GTagCDCF. The related parameters in each of the baselines are also
tuned based on the performance measured on the randomly selected fold, as described
in Section 5.2.
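
The cross-validation step of the protocol above can be illustrated with a small standard-library sketch. The helper names `k_fold_indices` and `cross_validate`, the shuffling seed, and the use of the population standard deviation for the spread are assumptions; the exact fold construction and the tuning on a randomly selected fold are those described in Section 5.2, which this sketch does not attempt to reproduce.

```python
import random


def k_fold_indices(n_items, k=4, seed=0):
    """Shuffle item indices and deal them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_items, evaluate, k=4, seed=0):
    """Call evaluate(train_idx, test_idx) once per fold; return (mean, std)."""
    folds = k_fold_indices(n_items, k, seed)
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        scores.append(evaluate(train_idx, test_idx))
    mean = sum(scores) / k
    std = (sum((s - mean) ** 2 for s in scores) / k) ** 0.5
    return mean, std


# Toy usage: the "metric" is just the test fraction, standing in for MAE or MAP.
mean, std = cross_validate(8, lambda train, test: len(test) / (len(train) + len(test)))
print(f"{mean:.3f} +/- {std:.3f}")
```

In a real run, `evaluate` would train a model on the training indices and return its MAE or MAP on the held-out fold, so that each approach yields a mean and a spread over the four folds.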

The results are shown in Table V, from which we can observe that GTagCDCF still
achieves significant improvement over PMF and SVD++ on each individual dataset,
i.e., ca. 2% improvement on the ML-all and the LT-all datasets in terms of MAE, and
ca. 3% improvement on the LF dataset in terms of MAP. Note that we did not find that
SVD++ could generate a reasonable result on the implicit feedback dataset, LF; thus,
the corresponding entry was left empty. The experimental results indicate that
GTagCDCF can be sufficiently robust for use cases with large datasets and multiple
domains, which gives a positive answer to our last research question.
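
The quoted percentages can be checked against the mean scores in Table V. The sketch below assumes that "improvement" means relative improvement over the baseline score (a reduction for MAE, an increase for MAP); the function name `relative_improvement` and the dictionary layout are illustrative choices, not part of the original evaluation.

```python
def relative_improvement(baseline, proposed, lower_is_better):
    """Relative gain of `proposed` over `baseline`, as a fraction of the baseline."""
    delta = baseline - proposed if lower_is_better else proposed - baseline
    return delta / baseline


# Mean scores copied from Table V; None marks the empty SVD++ entry on LF.
table_v = {
    "ML-all": {"metric": "MAE", "PMF": 0.610, "SVD++": 0.604, "GTagCDCF": 0.596},
    "LT-all": {"metric": "MAE", "PMF": 0.656, "SVD++": 0.652, "GTagCDCF": 0.644},
    "LF": {"metric": "MAP", "PMF": 0.416, "SVD++": None, "GTagCDCF": 0.430},
}

improvements = {}
for dataset, row in table_v.items():
    lower_is_better = row["metric"] == "MAE"  # MAE is an error, MAP a precision
    for baseline in ("PMF", "SVD++"):
        if row[baseline] is None:
            continue  # no SVD++ result was reported on LF
        gain = relative_improvement(row[baseline], row["GTagCDCF"], lower_is_better)
        improvements[(dataset, baseline)] = gain
        print(f"{dataset} vs {baseline}: {100 * gain:.1f}% ({row['metric']})")
```

Under this reading, the gains over PMF come out at roughly 2.3% (ML-all), 1.8% (LT-all) and 3.4% (LF), in line with the "ca. 2%" and "ca. 3%" figures; the gains over SVD++ are smaller, roughly 1.3% (ML-all) and 1.2% (LT-all).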



6. CONCLUSION AND FUTURE WORK 

In this paper, we present a new approach, GTagCDCF, to address the cross-domain 
collaborative filtering problem. GTagCDCF exploits user-contributed tags as explicit 
links to connect users and items from different recommender domains, which allows 
for knowledge transfer between different domains in order to attain mutual benefit. 
GTagCDCF is designed as a generalized framework that incorporates all the relations 
among users, items and tags in different domains into a learning process, in which 
the latent features of users and items in each domain are enhanced by the knowledge 
transferred via the latent features of common tags from other domains. In addition, 
GTagCDCF is analyzed to have the advantage of being suitable for both homogeneous 
and heterogeneous cross-domain cases, and of being computationally scalable at the 
same time. 

Based on two cross-domain cases, a homogeneous one and a heterogeneous one, 
our experimental results demonstrate that GTagCDCF could substantially outperform 
several state-of-the-art single-domain and cross-domain CF approaches in both cases.
This improvement is consistent for users with various rating profiles. We also show 
that GTagCDCF, as a generalized version, could substantially improve over TagCDCF, 
due to its ability to comprehensively utilize the information induced from common 
tags. In addition, we investigate the impact of user tagging behavior on the benefit 
that the users can attain from GTagCDCF and find that GTagCDCF could already 
become beneficial for those users who used just a few common tags, and it could be 
particularly beneficial for those who are active in tagging. Finally, we validate the ro- 
bustness of GTagCDCF via the evaluation of its performance in the three-domain case. 

For our future work, we would like to extend the GTagCDCF framework to exploit 
the common "context", rather than only tags, to further improve the mutual benefit be- 
tween different recommender domains. Context has been recognized as a critical factor 
for recommender systems [Adomavicius et al. 2011]. Apart from exploiting contextual
information for improving a particular recommender system, we would also use some 
types of contextual information that might be common to different recommender do- 
mains to establish cross-domain links. It would be interesting to investigate effective 
mechanisms to address the cross-domain collaborative filtering problem in the case of 
multiple types of cross-domain links. 